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Abstract 

We introduce a new version of dynamic time warping for samples of ob- 
served event times that are modeled as time-warped intensity processes. Our 
approach is developed within a framework where for each experimental unit or 
subject in a sample, one observes a random number of event times or random 
locations. As in our setting the number of observed events differs from sub- 
ject to subject, usual landmark alignment methods that require the number of 
events to be the same across subjects are not feasible. We address this chal- 
lenge by applying dynamic time warping, initially by aligning the event times 
for pairs of subjects, regardless of whether the numbers of observed events 
within the considered pair of subjects match. The information about pairwise 
alignments is then combined to extract an overall alignment of the events for 
each subject across the entire sample. This overall alignment provides a useful 
description of event data and can be used as a pre-processing step for subse- 
quent analysis. The method is illustrated with a historical fertility study and 
with on-line auction data. 



Keywords: Alignment, Birth data, French- Canadian fertihty. Functional data 
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1 Introduction 



Time warping is commonly used in functional data analysis to address the pres- 
ence of random variation in time in addition to amplit ude variation. A majo r diffi- 
culty is the non-identifiability of these two components (ILiu and Miillerll2004j ). This 
non-identifiability requires that one makes strong assumptions on the nature of the 
warping mechanism in order to be able to distinguish it from the variation in the 
random amplitudes. While various solutions have emerged over the years, a gold 
standard for functional time warping is the landmark method (IKneip and Gasser 



1992 



Gasser and Kneipl 119951 ). In this method, one typically extracts for each ran- 



dom curve Xi in a sample of i.i.d. trajectories Xi, . . . ,Xn a series of landmarks, 
which correspond to the times or locations where certain features occur, such as the 
location of a zero or the location of a peak. 

One then uses these locations to represent the sample or for subsequent continu- 
ous time warping of the random functions, by moving the landmark locations to their 
cross-sample average locations. Then one can smoothly map the time intervals in 
between these locations, for example by applying monotone spline transformations. 
Usually, the landmarks are obtained by presmoothing the often noisily observed func- 
tions Xi, obtaining estimates Xi and then extracting features such as peak locations 
by substituting the unknown peak location 6i = argmax^Xj(x) by the estimated 
peak location 9i = ar gmax^ Xj (x) , where a peak is just one of many possible features 
defining a landmark (j Gasser et al.lll984bl ). 

A persistent difficulty with the landmark method has been that in real data 
applications, landmarks such as peaks can be hard to identify for specific sample 
curves, due to either noise in the measurements or to the presence of non-standard 
trajectories, which do not exhibit some of the landmarks or have more landmarks. In 
growth curves, for example, one might encounter more than one mid-growth spurt 
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and it is then hard to decide how to position extra peaks within the sequence of 
landmarks. The landmark method however requires that the number of identified 
landmarks matches across all curves and also that the landmarks are in a sequence 
that is equally interpretable across all subjects in terms of features (location of peaks 
and of zeros and size of peaks for functions and derivatives). In addition, although 
seemingly straightforward, the computational task of landmark extraction is actually 
quite burdensome and usually cannot be fully automated, so that identifying and 
verifying a large number of landmarks even for mod est samples of curve s can be a 
major effort that includes some subjective elements (IGasser et al.lll984al ). 

For these reasons, non-landmark based methods for function warping have met 



with increasing interest in recent years in the functiona. 
( 



Silverman 



2004 



1995 



Wang and Gasser 



Kneip and Ramsay 



2008; 



1997 



Aach and Church 



Telesca and Inoue 



data analy sis hterature 
R0nJ 



2001 



2001 



Gervini and Gasser 



20081 ). As already mentioned, such 



methods depend explicitly or implicitly on strong assumptions to circumvent the non- 
identifiability problem. In this paper, we take a different approach. We assume from 
the beginning that the number of landmarks that can be identified in sample trajec- 
tories is random, but that nevertheless their order matters, so that the lower ranked 
landmarks should be placed closer together in time than the higher ranked land- 
marks, irrespective of the number of landmarks that are available for each subject. 

More generally, the times at which landmarks occur are viewed as random event 
times that are observed for each subject. These event times carry information about 
their order and relative location to each other, but otherwise are not distinguish- 
able. Typical examples for such event data that we use to illustrate our methods are 
age at child birth per woman in a large French-Canadian historical cohort, where 
women typically have a large number of offspring, and the timings of bids submitted 
at online e-bay auctions, recorded for each auction in a sample of auctions. Such 
data can be viewed as realizations of random intensities of underlying point pro- 
cesses. Related methods for time synchronization for densities have been studied in 



Zhang and Miil" 



to 



Bouzas et al. 



mom : 



eri (12011); for general function al methodology for intensities we refer 



Wu and Zhang! tom . 



To model such data, denote the rii observed event times for the i-th subject 
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by tji, . . . , and assume that all event times are observed within the interval 
T= [0,T]. We consider standardized cumulative incidence functions 

and assume these are generated by an underlying monotone increasing fixed function 
fi and warping functions hi, which are i.i.d. realizations of a random variable H 
taking values on >V(T) = {/ G C(T) | / strictly increasing, /(O) = 0, /(T) = T}, 
such that 

Yi{tij) = jj,{hr\tij)), tijeT, i = l,...,n, j = l,...,ni. (1) 

Here, fi{t) is a cumulative distribution function and the constraints on H ensure 
that fi{h~^(t)) also are (random) cumulative distribution functions. In this frame- 
work, registration procedures are defined by estimates of the registration functions 
hi. The determination of the time warping functions hi is of intrinsic interest and 
also allows to determine unwarped versions Xi{t) = Yi{hi{t)), t G T of the 1^. For 
identifiability purposes, it is expedient to assume that E[if(t)] = /r(^), where Ij- is 
the identity function on the domain T. 



2 Methodology 

2.1 Pairwise registration 



For p airwise registration, we adopt the approach proposed by iRong and Miiller 



The idea is to obtain global alignment for a sample of random functions, 
from information about pairwise alignments. Using this device, the task of global 
registration is reduced to the often more manageable task of constructing pairwise 
or "relative" alignments between pairs of random trajectories. 

The initial goal is thus to obtain pairwise synchronizati ons for all or many pairs 



of curves Yi and Yj, as defined in ([T]). For this purpose, iRong and Miillerl ( 120081 ) 
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define pairwise synchronization functions 



9At) = hj{K\t)), 



(2) 



wfiicli provide a transformation of the time scale of the curve Yj towards that of Yi, 
and show that K[hj{h^^{t))\h~^{t)] = h^^{t). This motivates the estimators 



(3) 



The alignment problem is then reduced to the problem of defining appropriate 
estimators gji{t), that can t ake into account t h e par ticular nature of the data, in 



Rong and Miillerl ( 2008 ) proposed minimization of a 



our case event data. While 
penalized L^-distance to obtain appropriate estimates of gji{t), in the situation we 
study here, dynamic time warping (DTW) provides a promising alternative. 



2.2 Dynamic time warping 

Dynamic time warping (DTW) is a series of alignment algo rithms originally devel- 



Kruskal and Liberman 



oped in the 1970s in the context of speech recognition (see 
(119831 ) for an overview). These are dynamic programming algorithms that, given 
a certain metric, consist on aligning two sequences of outcomes by minimizing the 
total distance between them, computed as the sum of distances between each pair 
of points along the aligned positions in a suitable metric. Dynamic time warping is 
similar to the algorithms used for the alignment of biological sequences, such as the 
Needleman-Wunsch algorithm (INeedleman and Wunschlll970l ) for global alignment. 

In this context, given two sequences a = (ai, . . . , a^) and b = (6i, . . . , 6^), with 
values on a certain feature space A, an alignment of a and b is a sequence {efcjfcii of 
variable length \e\, taking values on {(0, 1), (1, 0), (1, 1)} such that YlkLi = (^, 
That is, {Lr,Mr) = X]fc=i^fc5 ^ = defines a path from (0,0) to (£, m) in 

the two-dimensional integer lattice. We say that aj is aligned to bj if there exists a 
1 < r < |e| such that Yll:=i = (hj)- Thus in dynamic time warping one allows one 
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element in one sequence to be aligned to more than one element in the other sequence, 
which makes this method suitable for sequences of event times with unequal number 
of events in each sequence. The general approach is illustrated in Figure [H 
In our setting of event data, we observe pairs of curves a = {Yi{tii), . . . , 
and b = (Yj(tji, Yj(tjn^) at event times t = {tn, tin^) and s = {tji, tj^j), 
respectively. Then, to align curves Yi and Yj one needs to determine, first, which 
elements of the sequence a should be mapped to which elements of the sequence b, 
and second, the time points at which these aligned values should to be placed. For 
the first determination, a discrete DTW approach, which is described next, is used. 
The second determination is addressed in Section 12.31 




Dynamic time-warping interpretation: 



(2l 0,2 CL3 0-4 0-5 
Alignment: 
£ = {(1,1),(1,0),(1,0),(1,1),(1,1),(0,1)} 



ai a2 as 04 as 

\l/ I /\ 
bi 62 63 h 



Figure 1: Graphical representation of a dynamical time warping alignment between 
two sequences a = (ai, . . . , a^) and b = (61, ... , bm)- Lines represent correspondence 
between elements of the two sequences. 



From now on and in the following, we use the notation a = (ai,...,a^) and 
b = (61, ... , bm) to refer to the observed values, without any temporal reference, of 
any pair of curves. We now describe the DTW algorithm that allows us to find an 
optimal mapping between these two sequences. 

Let d : A X A ^ W he a. suitable distance function defined on the feature space. 
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Then, for any alignment e, its alignment distance is given by 



\e\ 



(4) 



fc=i 



where w{-) represents time weights that account for the length of t he alignment. Dif- 



ferent definitions of these weights are possible, even w{-) = 1 (see IWang and Gasser 
(119971 ) for the study of weight functions in t he continuous time DTW prob lem). In 
this work we consider the definition given by Kruskal and Liberman ( 19831 ). w{k) = 
(^Lfe — ^i/fc-i + ■^Mfc — "Sa4_J/2, which is quite standard in the DTW literature. 
Then, the optimal alignment is 



e = arg min Z^^ (a, b) , 



(5) 



where Se^m is the set of all possible alignments of sequences with lengths i and m. 
The dynamic time-warping algorithm used to retrieve Sab is as follows: 

1. Initialize: 

Ai,i) = 0, 1(1,1) = (1,1), 

i5(i,i) = A-i,i + c?(ai,fei) ■ (ti-ti-i)/2, %i) = (l,0), 2 = 2,...,£, 

^(ij) = Dij^i + d{ai, bj) ■ {sj - Sj-_i)/2, J(ij) = (0, 1), j = 2, . . . , m. 



2. Iterate for z = 2 



j = 2, m: 



{U - ti^i + Sj - Sj^i) 



L)(ij)=min< 



-D(i-ij-i) + d{ai, bj)- 
A-ij) + d{a„b,)- ^^' ~ ^"^^ ■n{/(,_i,,)^(0, l)} + Mn{Vi,,)=(0, 1)} 



-D(ij„i) + d{ai,bj)- 



.ll{/(,_M)^(l,0)} + Mll{/(,„i,)=(l,0)} 



(1, 1) if = + d{ai, bj) 



(1,0) if D(^ij) = D^i^ij) + d{ai,bj) 



(tj - ti_i + Sj - Sj-i) 

2 

(ti — ti_i) 



(0, 1) if = D(^ij_i) + d{ai, bj) ■ 



3. dirab{a., b) = -D(£,m)- 

4. Trace back: 

£'\^ab\ = i = i- J = m - /(£,m)(2), /c = le"^! and while z > 1 and 

j > 1 do: 

~ k = k — 1, 

Note that in step 1, the initiahzation -D(i,i) = 0, /(i,i) = (1, 1), forces the two first 
values of the two sequences to be aligned together. In step 2, we prevent having 
a vertical movement followed by an horizontal one, or vice versa, by choosing a 
sufficiently large constant M. This means that if for instance Oj, . . . , aj+fc? k > 
0, are all mapped to bj, then bj+i can not be mapped again to ai+k- As for the 
computational cost, it is 0{i ■ m), which is low, since i and m, the number of events 
per curve, are usually quite small. 

Once we obtain the optimal alignment, e"'', of sequences a and b, what we get 
is a correspondence between values ai, . . . ,ae and hi, ... , bm, without any timing 
reference. That is, at this stage we do not tackle the problem of designating the 
time points at which two (or more) aligned values should be placed. This is a major 
difference between our approach and previous \ york using DTW related algor ithms 



in the context of time warping, as for example IWang and Gasserl ( 119971 . 119991 ). In- 
deed, these works deal with continuous features and time synchronization problems 
in which finding similarities between points i n differ e nt cur ves and assigning temporal 



references cannot be tackled separately. In iBigotI ( l2006l ). a dynamic programming 
algorithm similar to ([6]) is used to find a correspondence between two previously 
identified sets of landmarks from two curves. Then, a penalized least squares min- 
imization problem is solved to estimate the warping functions that register the two 
curves through the alignment of their landmarks. 

Our approach is quite different since we focus on global alignment of the whole 
set of curves for which pairwise alignment maps constitute just a preliminary stage. 
That is, we use DTW to determine the pairwise correspondence of events, and this 
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information about pairwise correspondences is then combined (see Section 12. 3p to 
obtain global alignment, i.e., to register the entire sample. This is a key difference, as 
the main disadvantage of DTW based registration methods is that the generalization 
from two to a larger number of curves is not straightforward. Indeed, extensions 
to the alignment of a larger set of curves usually lie in defining a common set of 
landmarks (from individual landmark vectors or from a template curve) to which 
then the entire sample of curves is aligned. However, this ad hoc approach suffers 
from a loss of information and requires computationally intensive algorithms. 

In the last decade, DTW has been used for the alignment of exp r ession pro- 
files from time-course micro array experiments (see lAach and ChurchI ( 120011 ) and 



Clote and Straubhaarl (120061 )). In this context, a n-dimensional time series, con- 



taining the expression levels of a group of n genes recorded at the same time points 
(and which are assumed to be synchronized), is aligned to another n-dimensional 
time series corresponding to a different microarray experiment of the same n genes. 
That is, DTW is used to perform pairwise alignment of two n-dimensional sequences. 
This is different from using DTW to align n different trajectories. Indeed, although 
multiple dynamic time warping is theoretically possible and algorithmically equiva- 
lent to the pairwise case, it is computationally unfeasible even for a relatively small 
number of sequences, since the computing time is proportional to where n repre- 
sents here the total number of sequences and ^ their length, assuming they all have 
similar lengths. 



2.3 Pairwise dynamic time warping 

The proposed strategy is to combine DTW, to obtain pairwise alignment maps in a 
first step, with pairwise registration. The goal is to extract information about global 
alignment from the pairwise correspondences. 

Then, given a sample of n curves, we aim at estimating gij{t) and gji{t) for any 
i, j = 1, . . . ,n. The observations in model ([T]) are (tjfc, i/ik), k = 1, . . . ,ni, i = 1, . . . ,n, 
where tik is the time at which the k-th event is observed for individual i and yik is 
the observed value of Yi at time tik- 
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However, in what follows we will focus on any given pair of curves yi and y^, 
and for the sake of simplicity will denote the corresponding observations as (tfc, Uik)-, 
k = l,...,ni and {sk,yjk), k = l,...,nj. Applying algorithm (E]) we obtain an 
optimal mapping, of the values yi and yj. But one thing is to decide which 
values on both sequences can be aligned together, and another thing is to decide the 
time points corresponding to the aligned records. However, recall that in our case the 
alignment of pairs of curves is just a preliminary step to arrive at a global alignment 
of the whole sample of curves and is not an objective by itself. From estimated 
pairwise alignments gjiit) and gij{t), we also obtain Qjiitk) = Sh and gij{sh) = tk- 
However, if more than one value in one curve is mapped to the same single value in 
the other curve some additional considerations are needed. 

We now describe the detailed procedure to obtain gij{t) and gji{t) from observed 
sequences {tk, yik), k = 1, . . . ,ni, and {sk, yjk), k = 1, . . . ,nj: 

- Find the optimal alignment between the two sequences applying algorithm 
([6]) with d being the Euclidean distance in M. Denote p = \e'^-^\- 

- (^l:p,M,,) = {ELl4^}.=,,...,,- 

- ai = Ml and a2:n, = {M.,, s = 2,. . .,£, s.t. Ls ^ Ls-i} 
(3i = Li and /32;n, = {Ls, s = 2,...,i, s.t. M, ^ M,_i}. 

- For k = 1 . . . , rij : 

K = ■^Qfe if «fc-i7^afc7^ttfc+i (only one of them if /c = 1 or rii) 

tUH= Sa,-l + J^h, h = 0,...,H-l, with / = ^^^^^±^^^1^ 

if ttfc-l 7^ ttfe = ttfc+1 = ■ ■ ■ = ak+H-1 7^ 0(k+H 

(7) 

- For k = 1 . . . ,nj: 

s*k = tp^ if 7^ /3fc 7^ /3fc+i (only one of them if /c = 1 or rij) 

sU=t,.-l + j^^h, h = 0,...,H-l, with / = ^^£^^±^^^1^ 

if Pk-l 7^ /3/c = Pk+1 = ■ ■ ■ = Pk+H-l 7^ Pk+H 

(8) 
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- Define 

9ji{tk) = ifc, k = l,...,ni gij{sk) = s^, /c = 1, . . . , n^. 



(9) 



Note that expressions ([7]) and ([8]) correspond to an interpolation step in the cases in 
which more than one value of yt has been assigned to the same single value of yj and 
vice versa. In those cases, we spread those values in a certain interval rather than 
bringing them together to the same time point. That is, if values yik, ■ ■ ■ , yi,k+H~i are 
all mapped to yj,^^, instead of defining gji(tk+h) = Sa,,, h = 0, . . . , H — 1, we equally 
distribute these points around Waf, in such a way that the slope of gji between tk and 
tfc+H-i is equal to 6 (where 5 is a parameter that needs to be chosen in advance). 
In this way we guarantee that the estimates gji{t) and gij{t) are strictly increasing. 
This is illustrated in Figure O 

Once we have discr ete time estimate s of qj i(t) and gij{t), for any pair i,j = 



1 . . . ,n, i j , following iRong and Miillerl (120081 ) we define: 



hi\tk) = ^—-y^djiitk), k = l,...,ni, i = l,...,n. (10) 
n — I ^ 

We note here that each estimated warping function, /i"^, is only defined over the grid 
of observed event times of its corresponding curve, The same applies for the esti- 
mated registered curves Xi = yiohi. Then, any further step involving computations 
with either the estimated warping functions or registered curves, such us calculat- 
ing the sample mean of the registered curves for instance, requires an additional 
interpolation step aiming to anchor curves at a common grid of time points. 

As for the computational complexity of the pairwise dynamic time warping algo- 
rithm to produce an alignment of a set of n curves, this is proportional to (2)^^ ~ 
(n ■ £)^, [i standing for some average number of events). This represents a very 
important reduction compared to the dynamic time warping approach for multiple 
alignment, which has a computational cost of C(£"), as we have already mentioned. 
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Figure 2: Left: Blue stars are points (tk,yik) and red stars are points {sk,yjk)- 
Solid black lines represent the alignment obtained by DTW. Imaginary continuous 
trajectories are displayed in dashed lines, although these are not observed in practice. 
Right: Alignment of i/i towards yj. Blue stars are now points (t^ = gji(tk),yik)- 
Points (t2,yi2), {H.ya), (^4,1/24) although all mapped to {s2,yj2), are spread in a 
small interval around S2- 

3 Applications 

3.1 Historical fertility data 

We study fertility from a well-documented 17/18th century cohort of 1877 native 
born French-Canadian women who lived past age 50. This data contain the number 
of children and the age at the different births for each woman in the sample. We 
consider only those women with more than one child; the resulting data consist of 
1810 individuals. For more details on this data set see iMiiller et al] koO'i l and the 
references therein. 

To study the relation between the total number of children and the dynamics 
of the giving birth process, we transform the data to obtain a curve {tik,yik), i = 
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1, . . . ,?T,j for each woman, where tik is the age of the i-th woman at her k-th birth, 
i/ik = k and rii is her total number of children. The time domain is T = [13.7, 50.2), 
units are years. To anchor all sample curves on this interval, we add points (13.7, 0) 
and (50.2,724) at the beginning and at the end of each curve. Since this introduces 
an artificial constant fragment at the end of each curve an important reference is the 
age at the last birth for each woman and not the end of the interval, we force points 
{tim, Vim), ^ = 1, . . . , n, to be mapped together. The whole sample before and after 
registration by the procedure described in Section 12.31 is presented in Figure [31 Note 
that the registered sample includes contracted time domains for some women so that 
the periods between births may be shorter than 9 months. This is a consequence of 
the forced alignment at the first and last births across all women and is also reflected 
in some of the mean group intensity functions (averages of the registered curves 
for women with the same number of offspring) displayed in Figure HI Nevertheless, 
the mean intensity function for the whole sample presents a good approximation of 
the standard fertility process and captures a linear trend along with the mean ages 
at the first and last birth, which are 22.4 and 40.2 years respectively. In the case 
of the sample mean of the original trajectories (Figure HI), these ages are under-, 
respectively, over-estimated. After registration, the estimated warping functions 
can be used for classification purposes. We consider a distance-based approach by 
defining individual distances from the warping functions as follows, 

d{z,j) = {hi\t) - h-\t)ydt, i,j = l...,n. (11) 

Indeed, since the warping functions are strictly increasing and defined from T to T, 
the area between two of these functions is a good indicator of the difference in the 
degree of time distortion that they introduce in the respective individual curves. We 
expect that this distance will be useful at discriminating between different fertility 
dynamics. This distance matrix needs to be computed numerically from the discrete- 
valued warping function estimates. 

Unlike common approaches in functional data analysis, we propose to perform 
clustering directly from the distance matrix without any previous step to reduce 
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Figure 3: Left: Absolute number of children along time for 1810 women. Stars 
represent the observed data, solid lines are obtained by connecting births within 
each record. Right: Absolute number of children versus registered births times for 
1810 women. 
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Figure 4: Left: Mean intensity function before (dashed-dotted line) and after regis- 
tration (solid line). Right: Group mean intensity functions. The groups are defined 
by the women with the same number of children. 
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the dimensionality of the data. The clustering method is a modified version of the 
k-means algorithm in which the centroid of a group G, is calculated as its Frechet 
mean, cg = arg min^^gG T^iyf-c d'^ix.y). To choose the ri umber of clusters we use 
the silhouette coefficient (IKaufman and Rousseeuwi Il990l ) that for each individual 
provides a measure of how well classified it is. The silhouette of individual i is 
(bi—ai) I maxjflj, where is the mean distance of i to the rest of individuals in the 
same cluster and hi is the mean distance of i to the individuals of the closest cluster 
(besides the one it is assigned to). Finally, the silhouette coefficient of a clustering is 
the mean silhouette across individuals. Then, the choice of the number of clusters is 
done by maximizing the silhouette coefficient. For this da ta set, the maximum was 



Kaufman and Rousseeuw 



found for k = 2 clusters, with a value of 0.60. According to 

( 199ol ). a value of this coefficient between 0.51 and 0.70 indicates that a reasonable 
structure has been found in the data. 

The results for k = 2 clusters are displayed in Figure [5l The two clusters can 
be defined as women with late birth trajectories, and women with regular birth 
trajectories. A regular birth trajectory is almost linear, the first birth occurring 
in average at 20.8 years and the last birth at 39.7 years. In contrast, a late birth 
trajectory is piecewise linear with two differentiated pieces, the first one with lower 
slope than the second one. In fact the slope of the second fragment is similar to that 
of a regular trajectory. The average ages at the first and last birth for women in this 
category are 29.4 and 41.8 years. Then, we can say that the main difference between 
the women in the two clusters is the age at which the first child is born, since after 
the first birth, the fertility process looks similar in both cases. If we now look at the 
distribution of the number of children in these two clusters we observe an expected 
consequence, namely, that the total number of children is generally lower for those 
women that had their first child at an older age. 



3.2 Online auction data 



Modelling of price paths in on- l ine auction data has received a lot of attention in 



recent years (jShmueli and Jank 



2005 



Jank and Shmueli 



2006 



Shmueli et al. 



2007; 
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Liu and Miiller 



20081 ). One of the reasons is the availabihty of huge amounts of 



data made pubhc by the on-hne auction and shopping website eBay.com, which 
has become a global market place in which millions of people worldwide buy and 
sell products. The price evolution during an auction can be thought as a con- 
tinuous process which is observed discretely and sparsely only at the instants in 
which bids are placed. In fact, bids tend to concentrate at the beginning and 
at the end of the auction, corresponding to two typically observed phenomena, 
"early bidding" and "bid sniping" (a situation in which "snipers" place their bids 
at the very last moment). Here, we analyze a set of 158 eBay auctions for Palm 
M515 Personal Digital Assistants (PDA), of a fixed duration of seven days, that 
took place between March and May, 2003. This data set is publicly available at 
http: / / ww w.rhsmith.umd.edu / digits /stat istics/data.aspx and h as been previously 



st udied by 
in 



Shmueli and JankI (120051 ) and iLiu and Miillerl ( 120081 ). among others. As 



Liu and Miiller 



we restrict our analysis to 156 auctions after removing two 
irregular recordings. 

Our interest here is not focused on the price process, but rather on the intensity 
functions that quantify the bid arrivals along the time axis, where each bid arrival 
generates an event time. This approach may be useful to understand different bidding 
behaviors and for characterizing the bid arrivals process. A characteristic of these 
data is that bidding trajectories for different auctions are quite dissimilar. While 
most of them reflect very low bidding activity at the beginning of the auction and 
intense bidding when the auction is near its end, in other auctions one observes 
different patterns. It seems reasonable to view these trajectories as accelerated and 
decelerated versions of an underlying bidding intensity function and therefore the 
proposed time warping model seems suitable for this analysis. 

The time domain for observations is T = [0, 168), where the time units are hours. 
As in the previous example, we have added points (0, 0) and (168, rii) at the beginning 
and at the end of each trajectory. The minimum and maximum observed numbers 
of bids for auctions in the sample are 8 and 51 respectively. Applying the algorithm 
described in Section 12.31 with results shown in Figure El indicates that the registered 
(non-standardized) intensity functions show a clear exponential-type pattern. We 
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Figure 6: Left: Observed bidding trajectories Yi for 156 auctions. Stars represent 
the observed data, solid lines are connecting the stars. Right: Registered bidding 
trajectories. 

obtain an "almost common" intensity function irrespective of the number of bids. 
Indeed, the function is the same for all auctions except for the time period after the 
last bid, which is represented by a constant straight line at a height that depends 
on the total number of bids. This time period has been expanded (resp. contracted) 
during the registration process in those auctions with a low (resp. high) number of 
bids. Due to the low variability in the registered sample, the group mean intensity 
functions (Figure [7]) look similar to the registered trajectories. Also in Figured the 
overall mean intensity function after registration is compared to the sample mean 
of the observed curves. We can appreciate how "early bidding" and "bid sniping" 
emerge as phenomena that are due to time warping of the overall mean intensity 
function. 
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Figure 7: Left: Mean intensity function before (dashed-dotted line) and after regis- 
tration (solid line). Right: Group mean intensity functions. The grouping is defined 
by the number of bids. 
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Aiming to differentiate between different bidding behaviors, we apply the warping- 
based clustering procedure described in Section ISTTl Here, again the number of clus- 
ters was found to be /c = 2, with a silhouette coefficient of 0.65. In FigureOwe present 
the warping function estimates, bidding trajectories and final price distributions of 
the two clusters. In fact, we want to investigate any possible relation between the 
bidding dynamics and the closing price of an auction. The clusters are defined by 
auctions with late ( "L" ) and regular-early ( "R-E" ) bidding activity. This is obvious 
if we look at the warping function estimates corresponding to each group. However, 
it is more difficult to tell the differences between the observed bidding trajectories of 
the two clusters. Some of them, although classified in different groups may be close 
together in terms of an distance on the space of observed trajectories. Then, the 
warping-based clustering provides a useful tool for understanding bidding activity. 
The first cluster, "L", is composed by auctions with a very slow start, and contains 
most of the auctions with lowest final prices. Indeed the median final price of cluster 
"L" is slightly lower than that of "R-E" . However it also contains auctions with high 
closing prices (ranging between 240 and 260 $), although these systematically corre- 
spond to auctions with unusually high opening bids (ranging between 50 and 179.99 
$). The second cluster, "R-E", contains both auctions with bidding trajectories that 
are similar to the mean intensity function obtained after registration and auctions 
presenting very intense early bidding activity. The bidding trajectories of the two 
groups seem to present a similar final shape (a very important increase in the bidding 
activity during the last day of the auction) up to a vertical shift. That is, bidders 
behavior during the last day of an auction seems to be the same independently of 
the previous bidding activity until that moment. According to this, one could think 
that the first hours of an auction determine its outcome. However, this may not 
be completely true. In fact, in cluster "R-E" one can distinguish two very different 
starting behaviors, which do not correspond to different price distributions. Indeed, 
if one chooses to divide the data into 3 groups, cluster "R-E" is split into an "R" 
and an "E" cluster for which the closing price distributions are very similar. As 
a conclusion, we could say that it is a very low activity during the first 3/4 of an 
auction that yields to lower closing prices. In contrast, auctions with an early intense 
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bidding activity do not necessary lead to higher final prices than those with a regular 
starting. 



4 Discussion 

We have proposed an efficient registration algorithm for functional data that can 
be characterized by a random number of event times, which could correspond to 
original observations, or alternatively could be derived features from functional data 
such as the location of peaks, which are not consistently observed across the random 
trajectories that constitute the functional data sample. One of the advantages of 
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the proposed method is that it can cope with large sets of curves. Unhke previous 
DTW-based strategies for continuous time curve registration we focus on discrete 
time pairwise "local" alignments that subsequently are combined to obtain a whole 
sample "global" alignment. 

The method relies on the choice of a metric on the observations space, which 
is used to find matching values over different curves. Throughout this article we 
consider the Euclidean distance for this purpose, although any other metric could 
be used. Indeed, this is a crucial point: different metrics will provide different 
alignments. Knowing which values should be aligned together is equivalent to having 
a notion of what similarity means in each particular application. For example, in the 
applications of Section [HI we focus on aligning some particular events, such as the 
birth of the first child, for the fertility data. For this purpose, the Euclidean distance 
proves to be quite suitable. 

As for the clustering of a sample of curves, one may wonder why the distance- 
based method described in Section 13.11 is not directly applied to the observed curves 
instead of the warping estimates. Of course this could be done, but the results will 
be different, due to differences in the distance matrix used for clustering. Here again, 
different similarity notions can be suitable for different purposes. In this context, our 
proposal of clustering based on warping estimates can be seen as a way of defining a 
distance between individuals which focuses on similar temporal behavior. 
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